Stacked transformations for foreign accented speech recognition
نویسندگان
چکیده
Nowadays, large vocabulary speech recognizers exist that are performing reasonably well for specific conditions and environments. When the conditions change however, performance degrades quickly. For example, when the person to be recognized has a foreign accent the conditions could mismatch with the model, resulting in high error rates. The problem in recognizing foreign accented speech is the lack of sufficient training data. If enough data would be available of the same accent, from numerous different speakers, a well performing accented speech model could be built. Besides the lack of speech data, there are more problems with training a complete new model. It costs a lot of computational resources and storage space to train a new model. If speakers with different accents must be recognized, these costs explode as every accent needs retraining. A common solution for preventing retraining is to adapt (transform) an existing model, such that it better matches the recognition conditions. In this thesis multiple different adaptation transformations are considered. Speaker Transformations are using speech data from the target speaker, Accent Transformations use speech data from different speakers, who have the same accent as the speech that needs to be recognized. Neighbour Transformations are estimated with speech from different speakers that are automatically determined to be similar to the target speaker. Novelty in this work is the stack wise combination of these adaptations. Instead of using a single transformation, multiple transformations are ‘stacked together’. Because all adaptations except the speaker specific adaptation can be precomputed, no extra computational costs at recognition time occur compared to normal speaker adaptation and the adaptations that can be precomputed are much more refined as they can use more and better adaptation data. In addition, they need only a very small amount storage space, compared to a retrained model. The effect of Stacked Transformations is that the models have a better fit for the recognition utterances. When compared to no adaptation, improvements up to 30% in Word Error Rate can be achieved. In adaptation with a small number (5) of sentences, improvements up to 15% are gained.
منابع مشابه
The Role of Spectral Resolution in Foreign-Accented Speech Perception
Several studies have shown that diminished spectral resolution leads to poorer speech recognition in adverse listening conditions such as competing background noise or in cochlear implants. Although intelligibility is also reduced when the talker has a foreign accent, it is unknown how limited spectral resolution interacts with foreign-accent perception. It is hypothesized that limited spectral...
متن کاملComputational Modelling of the Recognition of Foreign-Accented Speech
In foreign-accented speech, pronunciation typically deviates from the canonical form to some degree. For native listeners, it has been shown that word recognition is more difficult for strongly-accented words than for less strongly-accented words. Furthermore recognition of strongly-accented words becomes easier with additional exposure to the foreign accent. In this paper, listeners’ behaviour...
متن کاملApproaches to foreign-accented speaker-independent speech recognition
Current research in the area of foreign-accented speech recognition focusses either on acoustic model adaptation or speakerdependent pronunciation variation modeling. In this paper both approaches are applied in parallel and in a speaker-independent fashion: the acoustic modeling part is based on a derived Hidden Markov Model (HMM) clustering algorithm and the lexicon adaptation is based on spe...
متن کاملEffect of foreign accent on speech recognition in the NATO n-4 corpus
We present results from a series of 151 speech recognition experiments based on the N4 corpus of accented English speech, using a small vocabulary recognition system. These experiments looked at the impact of foreign accent on speech recognition, both within non-native accented English and across different accents, with particular interest in using context free grammar technology to improve cal...
متن کاملSpeech-on-speech masking with variable access to the linguistic content of the masker speech for native and nonnative english speakers.
BACKGROUND Masking release for an English sentence-recognition task in the presence of foreign-accented English speech compared with native-accented English speech was reported in Calandruccio et al (2010a). The masking release appeared to increase as the masker intelligibility decreased. However, it could not be ruled out that spectral differences between the speech maskers were influencing th...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2011